Zipf's and Benford's laws in Twitter hashtags

Authors

  • José Alberto Pérez-Melián
  • J. Alberto Conejero
  • César Ferri
Abstract

Social networks have transformed communication dramatically in recent years through the rise of new platforms and the development of a new language of communication. This landscape requires new ways to describe and predict the behaviour of users in networks. This paper presents an analysis of the frequency distribution of hashtag popularity in Twitter conversations. Our objective is to determine whether this frequency distribution follows one of the well-known distributions that many real-life sets of numerical data satisfy. In particular, we study how closely the frequency distribution of hashtag popularity matches Zipf’s law, an empirical law stating that many types of data in the social sciences can be approximated by a Zipfian distribution. We also analyse Benford’s law, a special case of Zipf’s law that describes a common pattern in the frequency distribution of leading digits. To compute the frequency distribution of hashtag popularity correctly, we need to correct the many spelling errors that Twitter users introduce. For this purpose, we introduce a new filter, based on string distances, that corrects hashtag mistakes. Experiments on datasets of Twitter streams generated under controlled conditions show that Benford’s law and Zipf’s law can be used to model the hashtag frequency distribution.
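
As a rough illustration of the leading-digit comparison the abstract describes, the sketch below computes the Benford probabilities P(d) = log10(1 + 1/d) and the observed share of each leading digit in a set of hashtag counts. It is a minimal sketch, not the authors' method: the function names and the `hashtag_counts` values are hypothetical, made up only for illustration, and the paper's spelling-correction filter is not reproduced here.

```python
import math
from collections import Counter


def benford_expected():
    """Benford's law: P(d) = log10(1 + 1/d) for leading digits d = 1..9."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def leading_digit_distribution(counts):
    """Observed share of each leading digit among positive integer counts."""
    digits = [int(str(c)[0]) for c in counts if c > 0]
    if not digits:
        raise ValueError("need at least one positive count")
    tally = Counter(digits)
    return {d: tally.get(d, 0) / len(digits) for d in range(1, 10)}


# Hypothetical hashtag occurrence counts (illustrative values, not real data).
hashtag_counts = {
    "#news": 1204, "#music": 187, "#sports": 96, "#tech": 23,
    "#art": 15, "#food": 11, "#travel": 3,
}

expected = benford_expected()
observed = leading_digit_distribution(hashtag_counts.values())

# Print expected vs. observed share for each leading digit.
for d in range(1, 10):
    print(f"{d}: expected={expected[d]:.3f} observed={observed[d]:.3f}")
```

With a realistic dataset, the two columns could then be compared with a goodness-of-fit measure; which measure the paper uses is not stated in the abstract.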

Similar resources

Utilizing Twitter and #Hashtags Toward Enhancing Student Learning in an Online Course Environment

The authors offer an answer to the research question, To what extent and in what ways is Twitter helpful to student learning when group hashtags are created and used in collaborative educational environments? Sixty-two students in a spring 2012 graduate online Research Methodology course worked individually and in groups to create discussions on topics of interest through Twitter posts and stud...

Exploring Twitter Hashtags

Twitter messages often contain so-called hashtags to denote keywords related to them. Using a dataset of 29 million messages, I explore relations among these hashtags with respect to co-occurrences. Furthermore, I present an attempt to classify hashtags into five intuitive classes, using a machine-learning approach. The overall outcome is an interactive Web application to explore Twitter hashtags.

Recommending #-Tags in Twitter

Twitter, currently the most popular microblogging tool available, is used to publish more than 140,000,000 messages a day. Many users use hashtags to categorize their tweets. However, hashtags are not restricted in any way in terms of usage or syntax which leads to a very heterogeneous set of hashtags occurring in the Twitter universe and therefore, decreases the search capabilities. In this pa...

On the Real-time Prediction Problems of Bursting Hashtags in Twitter

Hundreds of thousands of hashtags are generated every day on Twitter. Only a few become bursting topics. Among the few, only some can be predicted in real-time. In this paper, we take the initiative to conduct a systematic study of a series of challenging real-time prediction problems of bursting hashtags. Which hashtags will become bursting? If they do, when will the burst happen? How long wil...

Suggesting Hashtags on Twitter

As micro-blogging sites, like Twitter, continue to grow in popularity, we are presented with the problem of how to effectively categorize and search for posts. Looking specifically at Twitter, we see that users may categorize their posts using hashtags, and any word or phrase may be used as the category. Attempting to search for tweets about Facebook, a user would need to try many different has...

Publication date: 2017